Probabilistic Latent Maximal Marginal Relevance

Abstract

Diversity has been heavily motivated in the information retrieval literature as an objective criterion for result sets in search and recommender systems. Perhaps one of the best-known and most widely used algorithms for result set diversification is Maximal Marginal Relevance (MMR). In this paper, we show that while MMR is somewhat ad hoc and motivated from a purely pragmatic perspective, we can derive a more principled variant via probabilistic inference in a latent variable graphical model. This novel derivation presents a formal probabilistic latent view of MMR (PLMMR) that (a) removes the need to manually balance relevance and diversity parameters, (b) shows that specific definitions of relevance and diversity metrics appropriate to MMR emerge naturally, and (c) formally derives variants of latent semantic indexing (LSI) similarity metrics for use in PLMMR. Empirically, PLMMR outperforms MMR with standard term-frequency-based similarity and diversity metrics since PLMMR maximizes latent diversity in the results.
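For orientation, the baseline that the abstract contrasts against can be sketched as follows. This is a minimal illustration of classic greedy MMR selection with a manually chosen trade-off weight, not the PLMMR derivation from the paper. The function names (`tf_vector`, `cosine`, `mmr`), the whitespace tokenizer, the use of cosine similarity over raw term frequencies, and the default `lam=0.5` are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_vector(text):
    # Hypothetical helper: raw term-frequency vector from whitespace tokens.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query, docs, k, lam=0.5):
    # Greedy MMR: repeatedly pick the candidate that maximizes
    #   lam * sim(doc, query) - (1 - lam) * max sim(doc, already selected).
    # lam is the hand-tuned relevance/diversity weight that the abstract
    # says PLMMR no longer requires.
    q = tf_vector(query)
    vecs = [tf_vector(d) for d in docs]
    selected = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vecs[i], q)
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected  # indices into docs, in selection order
```

With `lam=1.0` the procedure reduces to plain relevance ranking; lower values penalize candidates that resemble documents already selected. Because the similarity here is computed on surface term frequencies, it cannot capture the latent diversity that the abstract attributes to PLMMR.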


Similar resources

Query-Focused Multidocument Summarization Based on Hybrid Relevance Analysis and Surface Feature Salience

Query-focused multidocument summarization is to synthesize from a set of topic-related documents a brief, well-organized, fluent summary for the purpose of answering an information need that cannot be met by just stating a name, date, quantity, etc. In this paper, the task is essentially treated as a sentence retrieval task. We propose a hybrid relevance analysis to evaluate the relevance of a ...


Summarizing Relevant Information for Question-Answering Using Hybrid Relevance Analysis and Surface Feature Salience

Much research for question-answering aims to answer factoid, definitional and biographical questions. In most cases, the answers are given as a name, date, quantity, and so on. In this paper, we try to merge techniques of multidocument summarization and question-answering to generate a brief, well-organized fluent summary to provide more relevant information for the purpose of answering real-wo...


Extractive summarization of meeting recordings

Several approaches to automatic speech summarization are discussed below, using the ICSI Meetings corpus. We contrast feature-based approaches using prosodic and lexical features with maximal marginal relevance and latent semantic analysis approaches to summarization. While the latter two techniques are borrowed directly from the field of text summarization, feature-based approaches using proso...


Spike and Slab Gaussian Process Latent Variable Models

The Gaussian process latent variable model (GPLVM) is a popular approach to non-linear probabilistic dimensionality reduction. One design choice for the model is the number of latent variables. We present a spike and slab prior for the GP-LVM and propose an efficient variational inference procedure that gives a lower bound of the log marginal likelihood. The new model provides a more principled...


TREC 2010 Blog Track: Top Stories Identification

This paper describes our participation in the TREC 2010 Blog Track. For the Top Stories Identification Task, we explore the relationship among news events, news stories and blog posts. We first extract important news events from the TRC2 corpus using a probabilistic mixture model. Then, we propose a probabilistic approach to identify top news stories. Furthermore, we use an additional feature t...




Publication date: 2010